Mining Spatio-temporal Data on Industrialization from Historical Registries
نویسندگان
چکیده
Despite the growing availability of big data in many fields, historical data on socioevironmental phenomena are often not available due to a lack of automated and scalable approaches for collecting, digitizing, and assembling them. We have developed a data-mining method for extracting tabulated, geocoded data from printed directories. While scanning and optical character recognition (OCR) can digitize printed text, these methods alone do not capture the structure of the underlying data. Our pipeline integrates both page layout analysis and OCR to extract tabular, geocoded data from structured text. We demonstrate the utility of this method by applying it to scanned manufacturing registries from Rhode Island that record 41 years of industrial land use. The resulting spatio-temporal data can be used for socioenvironmental analyses of industrialization at a resolution that was not previously possible. In particular, we find strong evidence for the dispersion of manufacturing from the urban core of Providence, the state’s capital, along the Interstate 95 corridor to the north and south.
منابع مشابه
Mining Spatio-Temporal Patterns in Trajectory Data
Spatio-temporal patterns extracted from historical trajectories of moving objects reveal important knowledge about movement behavior for high quality LBS services. Existing approaches transform trajectories into sequences of location symbols and derive frequent subsequences by applying conventional sequential pattern mining algorithms. However, spatio-temporal correlations may be lost due to th...
متن کاملSpatio-Temporal Variation of Suspended Sediment Concentration at Downstream of a Sand Mine
The growing population led to greater human need to use natural resources such as sand and gravel mines. Direct removal of sands from the bed river leads to increase suspended sediment concentrations in downstream of harvested area and creates other problems viz. filling reservoirs, change in hydraulic characteristics of the channel and environmental damages. However, the range of temporal and ...
متن کاملContext-aware Modeling for Spatio-temporal Data Transmitted from a Wireless Body Sensor Network
Context-aware systems must be interoperable and work across different platforms at any time and in any place. Context data collected from wireless body area networks (WBAN) may be heterogeneous and imperfect, which makes their design and implementation difficult. In this research, we introduce a model which takes the dynamic nature of a context-aware system into consideration. This model is con...
متن کاملMining Trajectory Patterns by Incorporating Temporal Properties
Spatio-temporal patterns extracted from historical trajectories of moving objects unveil important knowledge about movement behavior for high quality LBS services. Existing approaches transform trajectories into sequences of regional symbols and discover frequent subsequences by applying conventional sequential pattern mining algorithms. However, spatio-temporal correlations in the original dat...
متن کاملSpatio-Temporal Data Mining: A Survey of Problems and Methods
Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains including, climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from relational data for which computational approaches are developed in the data mining community for multiple decades, in that both spatial and te...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1612.00992 شماره
صفحات -
تاریخ انتشار 2016